Abstract. We propose a generative model for group EEG analysis, 
based on appropriate kernel assumptions on EEG data. We derive the 
variational inference update rules using several approximation techniques. 
The proposed model outperforms the current state-of-the-art algorithms 
in terms of common pattern extraction. The validity of the proposed 
model is tested on the BCI competition III dataset. 

1 Introduction 

Electroencephalography (EEG) is a multivariate time-series recording of 
electrical potentials induced by ionic flows among neurons in the brain. Since 
EEG has the highest temporal resolution among non-invasive brain imaging 
techniques, it is widely used in brain-computer interface (BCI) research, 
especially in applications where real-time capability is required, such as 
controlling a computer cursor [M], mobile robots [11], a wheelchair [5,13], 
and a humanoid robot [m]. 

There have been many approaches to classifying mental states from 
preprocessed EEG signals. They include SVM [5], L1-regularized logistic 
regression [B], and nonnegative matrix factorization (NMF) [S]. According to 
[9], NMF-based methods do not require any cross-validation to determine the 
basis vectors, which contain useful spectral traits of motor imagery EEG signals. 

For each mental state, brain images consist of subject-dependent patterns and 
common patterns shared across multiple subjects. Most approaches proposed 
in the literature have not considered the latter. Since those methods cannot 
capture common features occurring across all subjects, a pilot training phase 
is required whenever a new subject comes to the system. To deal with 
this limitation, group NMF (GNMF) [10] was proposed, which modifies the cost 
function of the standard NMF. The advantage of group analysis of EEG is 
twofold. First, it finds common patterns that can be used in the testing phase 
of other subjects without a pilot test for each of them; second, it finds 
individual patterns that reflect intra-subject variability. 

Most NMF algorithms, including GNMF, are based on optimizing a cost 
function under constraints on the variables. Although such non-generative 
models give more accurate results in general, we cannot incorporate prior 
knowledge into them. It is well known that EEG data can be well represented 
with an exponential distribution [15], but there is no means to exploit this 
valuable information in the non-generative NMF models. In addition, 
non-generative models are not robust to small data sizes, whereas generative 
models can embed prior knowledge and achieve competitive performance 
with little data. 

With this motivation, we devise a generative model for group EEG analysis, 
based on Bayesian nonnegative matrix factorization. We derive the variational 
inference update rules using several approximation techniques. The validity of 
the proposed model is tested on the BCI competition III dataset. 

2 Model Description 

We apply power spectral density estimation to the preprocessed EEG signals to 
obtain a data tensor $X \in \mathbb{R}^{L \times M \times N}$, where $l$ indexes 
the subject. Each index $m$ corresponds to a frequency bin, and each index $n$ 
to a time stamp. In general, NMF [12] finds a decomposition of the form 
$X = AS$. We, however, assume two kinds of basis matrices. One is the common 
basis matrix $A_C \in \mathbb{R}^{M \times K}$, which reflects the activated 
regions and frequency bands for a specific task class. The other is the 
individual basis $A_I \in \mathbb{R}^{L \times M \times J}$. The individual 
patterns vary from subject to subject, even when the task is the same. Hence 
we model $X$ as 

\[
X_{lmn} = \sum_{k=1}^{K} (A_C)_{mk} (S_C)_{lkn} + \sum_{j=1}^{J} (A_I)_{lmj} (S_I)_{ljn},
\]

where $S_C$ is the class indicator and $S_I$ mixes the individual factors 
appropriately. It is well known that EEG data can be well represented with an 
exponential distribution [15], so we can construct a generative process as 
follows: 

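\[
\begin{aligned}
X_{lmn} \mid A_C, (A_I)_l, (S_C)_l, (S_I)_l &\sim \mathrm{Exponential}\Big( \sum_{k=1}^{K} (A_C)_{mk} (S_C)_{lkn} + \sum_{j=1}^{J} (A_I)_{lmj} (S_I)_{ljn} \Big) \\
(A_C)_{mk} \mid a &\sim \mathrm{Gamma}(a, a) \\
(A_I)_{lmj} \mid b &\sim \mathrm{Gamma}(b, b) \\
(S_C)_{lkn} &= \mathbb{1}(Y_{ln} = k) \\
(S_I)_{ljn} \mid Y_{ln}, c &\sim \mathrm{Gamma}(c_{Y_{ln}}, c_{Y_{ln}})
\end{aligned}
\tag{1}
\]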

The graphical model is shown in Fig. 1. We assume gamma distributions for all 
priors because they simplify the inference algorithm mathematically, as shown 
for GaP-NMF [7]. We design $A_C$ to hold class-specific images. Hence, we 
assume that the number of common bases, $K$, equals the number of classes. 
The individual bases are designed to depend on both the subject and the 
task class. 
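
As an illustration of the generative process in (1), the following Python sketch draws a small synthetic dataset. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that Exponential(θ) is parameterized by its mean θ and that Gamma(a, a) is parameterized by shape and rate. The function name `sample_group_eeg` and the nested-list layout are hypothetical.

```python
import random

def sample_group_eeg(Y, M, J, a=0.1, b=0.1, c=(0.1, 0.1, 0.1), seed=0):
    """Draw (X, A_C, A_I, S_I) following the generative process in (1).

    Y[l][n] is the class label (0..K-1) of subject l at time stamp n.
    Assumption: Gamma(shape, rate) priors and an exponential likelihood
    parameterized by its mean.
    """
    rng = random.Random(seed)
    L, N, K = len(Y), len(Y[0]), len(c)

    def gamma(shape, rate):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        return rng.gammavariate(shape, 1.0 / rate)

    A_C = [[gamma(a, a) for _ in range(K)] for _ in range(M)]
    A_I = [[[gamma(b, b) for _ in range(J)] for _ in range(M)] for _ in range(L)]
    S_I = [[[gamma(c[Y[l][n]], c[Y[l][n]]) for n in range(N)]
            for _ in range(J)] for l in range(L)]

    X = [[[0.0] * N for _ in range(M)] for _ in range(L)]
    for l in range(L):
        for m in range(M):
            for n in range(N):
                # S_C is a one-hot class indicator, so the common part
                # reduces to the basis column of the active class.
                common = A_C[m][Y[l][n]]
                individual = sum(A_I[l][m][j] * S_I[l][j][n] for j in range(J))
                mean = max(common + individual, 1e-300)  # guard against underflow
                X[l][m][n] = rng.expovariate(1.0 / mean)  # rate = 1 / mean
    return X, A_C, A_I, S_I
```

For example, `sample_group_eeg([[0, 1, 2, 0], [2, 2, 1, 0]], M=3, J=1)` returns a tensor `X` with 2 subjects, 3 frequency bins, and 4 time stamps.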

Fig. 1. Graphical model for Bayesian NMF 

In the training phase, both $X$ and $Y$ are used as data to infer the 
posteriors of $A_C$ and $A_I$, whereas $X$ is the only data available in the 
testing phase (note that we reuse the posteriors of $A_C$ and $A_I$ inferred in 
the training phase). Given the estimated posteriors of $A_C$ and $A_I$ and the 
test data $X^{test}$, we predict the class label $y^{test}$ as 
\[
y^{test} = \arg\max_{k} \lVert X^{test} - x^{k} \rVert, \qquad x^{k} = (A_C)^{k}
\]

3 Variational Inference 

There are two main kinds of inference techniques for Bayesian graphical 
models: Markov chain Monte Carlo (MCMC) and variational inference. Although 
MCMC is simple and easy to implement, it suffers from slow convergence, and 
its convergence is hard to assess. We therefore derive a variational inference 
algorithm for the proposed model. 

We derive the variational inference algorithm using a technique similar to 
the one introduced for the GaP-NMF model [7]. A typical mean-field variational 
inference uses, for each variable, a variational distribution from the same 
family as its prior, but Hoffman et al. [7] showed that using the generalized 
inverse-Gaussian (GIG) family [8] as the variational distribution gives a 
tighter bound. We therefore use GIG distributions to approximate $q(A_C)$ and 
$q(A_I)$. 

3.1 Lower Bound of the Marginal Likelihood 

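Under a fully factorized variational distribution $q$, we can derive a lower 
bound on the marginal likelihood of $X$: 

\[
\begin{aligned}
\log p(X \mid a, b, c, Y) \ge{} & E_q[\log p(X \mid A_C, S_C, A_I, S_I)] \\
& + E_q[\log p(A_C \mid a)] + E_q[\log p(A_I \mid b)] + E_q[\log p(S_I \mid c, Y)] \\
& - E_q[\log q(A_C)] - E_q[\log q(A_I)] - E_q[\log q(S_I)]
\end{aligned}
\tag{2}
\]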



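The first term of the bound in (2) can be expanded as in (3). 

\[
E_q[\log p(X \mid A_C, S_C, A_I, S_I)] = \sum_{l,m,n} \Big\{ E_q\Big[ \frac{-X_{lmn}}{\sum_k (A_C)_{mk} (S_C)_{lkn} + \sum_j (A_I)_{lmj} (S_I)_{ljn}} \Big] + E_q\Big[ -\log\Big( \sum_k (A_C)_{mk} (S_C)_{lkn} + \sum_j (A_I)_{lmj} (S_I)_{ljn} \Big) \Big] \Big\}
\tag{3}
\]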

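The first term in (3) can be bounded from below using Jensen's inequality, 
because $-(\cdot)^{-1}$ is a concave function. 

\[
\frac{-1}{\sum_k (A_C)_{mk} (S_C)_{lkn} + \sum_j (A_I)_{lmj} (S_I)_{ljn}} \ge -\pi_{lmn1}^2 \sum_k \frac{\phi_{lmnk}^2}{(S_C)_{lkn} (A_C)_{mk}} - \pi_{lmn2}^2 \sum_j \frac{\psi_{lmnj}^2}{(A_I)_{lmj} (S_I)_{ljn}}
\tag{4}
\]

where $\pi_{lmn1} + \pi_{lmn2} = 1$, $\sum_k \phi_{lmnk} = 1$, and 
$\sum_j \psi_{lmnj} = 1$. 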
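
The Jensen bound above can be checked numerically for fixed positive terms. The sketch below is only an illustration: it drops the expectations, treats every common and individual term as strictly positive (that is, only bases that are active under the class indicator), and uses hypothetical names such as `jensen_lower_bound`. It shows that any weights on the simplex give a valid bound, and that weights proportional to the terms themselves make it tight.

```python
import random

def jensen_lower_bound(common, individual, phi, psi, pi1):
    """Lower bound on -1 / (sum(common) + sum(individual)).

    phi and psi sum to one within each group; pi1 and pi2 = 1 - pi1
    split the weight between the common and the individual group.
    """
    pi2 = 1.0 - pi1
    return -(pi1 ** 2 * sum(p * p / x for p, x in zip(phi, common))
             + pi2 ** 2 * sum(p * p / x for p, x in zip(psi, individual)))

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
common = [rng.uniform(0.1, 2.0) for _ in range(3)]      # stands in for (A_C)(S_C) terms
individual = [rng.uniform(0.1, 2.0) for _ in range(2)]  # stands in for (A_I)(S_I) terms
exact = -1.0 / (sum(common) + sum(individual))

# Arbitrary weights on the simplex: a valid but generally loose bound.
phi = normalize([rng.random() + 0.01 for _ in common])
psi = normalize([rng.random() + 0.01 for _ in individual])
loose = jensen_lower_bound(common, individual, phi, psi, pi1=0.5)

# Weights proportional to the terms: the bound becomes an equality.
tight = jensen_lower_bound(
    common, individual, normalize(common), normalize(individual),
    pi1=sum(common) / (sum(common) + sum(individual)))
```

This is why the auxiliary weights are optimized in the inference algorithm: better weights close the gap between the bound and the true value.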



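For the second term in (3), we use the same method as in [7], which 
lower-bounds the convex function $-\log x$ by its first-order Taylor 
approximation around an auxiliary point $\omega_{lmn}$. 

\[
E_q\Big[ -\log\Big( \sum_k (A_C)_{mk} (S_C)_{lkn} + \sum_j (A_I)_{lmj} (S_I)_{ljn} \Big) \Big] \ge -\log \omega_{lmn} + 1 - \frac{1}{\omega_{lmn}} \Big( \sum_k E_q[(A_C)_{mk}] (S_C)_{lkn} + \sum_j E_q[(A_I)_{lmj} (S_I)_{ljn}] \Big)
\tag{5}
\]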
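
The first-order Taylor bound can be illustrated in the same spirit. The sketch below is a minimal numerical check with a hypothetical helper name, `taylor_lower_bound`. It evaluates the tangent-line bound on $-\log x$ at several expansion points and records the gap to the exact value; the gap is never negative and vanishes when the expansion point equals $x$.

```python
import math

def taylor_lower_bound(x, omega):
    """Tangent-line lower bound on -log(x), expanded at omega > 0."""
    return -math.log(omega) + 1.0 - x / omega

x = 1.7
# Gap between the exact value and the bound for several expansion points.
gaps = {omega: -math.log(x) - taylor_lower_bound(x, omega)
        for omega in (0.5, 1.0, 1.7, 3.0)}
```

Choosing the expansion point equal to the expected value of the sum therefore gives the tightest bound of this form.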



3.2 Optimization 

We present the optimization algorithm that maximizes the lower bound in (2), 
which yields approximations $q(A_C)$ and $q(A_I)$ to the posteriors of $A_C$ 
and $A_I$. 

To optimize $\phi$, $\psi$, and $\pi$, we use Lagrange multipliers with the 
sum-to-one constraints. 



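\[
\phi_{lmnk} \propto (S_C)_{lkn} \, E_q\Big[ \frac{1}{(A_C)_{mk}} \Big]^{-1}, \qquad
\psi_{lmnj} \propto E_q\Big[ \frac{1}{(A_I)_{lmj} (S_I)_{ljn}} \Big]^{-1}
\]

\[
\pi_{lmn1} = \frac{\sum_j \psi_{lmnj}^2 \, E_q\Big[ \frac{1}{(A_I)_{lmj} (S_I)_{ljn}} \Big]}{\sum_k \phi_{lmnk}^2 \, E_q\Big[ \frac{1}{(A_C)_{mk} (S_C)_{lkn}} \Big] + \sum_j \psi_{lmnj}^2 \, E_q\Big[ \frac{1}{(A_I)_{lmj} (S_I)_{ljn}} \Big]}
\tag{6}
\]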
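
The closed-form weight updates follow from minimizing the weighted sum of inverse expectations under the sum-to-one constraints. The sketch below works through one $(l, m, n)$ entry. It is an illustration under assumptions: the inputs are taken to be the already computed expectations of the inverse terms, only common bases that are active under the class indicator are passed in, and the names `objective` and `update_weights` are hypothetical.

```python
def objective(phi, psi, pi1, inv_common, inv_individual):
    """Quantity minimized by the weight updates for one (l, m, n) entry.

    inv_common[k]     stands for E_q[1 / ((A_C)_mk (S_C)_lkn)] (active k only)
    inv_individual[j] stands for E_q[1 / ((A_I)_lmj (S_I)_ljn)]
    """
    pi2 = 1.0 - pi1
    common = sum(p * p * v for p, v in zip(phi, inv_common))
    individual = sum(p * p * u for p, u in zip(psi, inv_individual))
    return pi1 ** 2 * common + pi2 ** 2 * individual

def update_weights(inv_common, inv_individual):
    """Lagrange-multiplier solution: phi, psi proportional to inverse inputs."""
    phi = [1.0 / v for v in inv_common]
    z_phi = sum(phi)
    phi = [p / z_phi for p in phi]

    psi = [1.0 / u for u in inv_individual]
    z_psi = sum(psi)
    psi = [p / z_psi for p in psi]

    common = sum(p * p * v for p, v in zip(phi, inv_common))
    individual = sum(p * p * u for p, u in zip(psi, inv_individual))
    # Minimizing pi1^2 * common + (1 - pi1)^2 * individual over pi1.
    pi1 = individual / (common + individual)
    return phi, psi, pi1

inv_c = [2.0, 0.5, 1.0]
inv_i = [4.0, 1.0]
phi, psi, pi1 = update_weights(inv_c, inv_i)
best = objective(phi, psi, pi1, inv_c, inv_i)
```

Because the objective is convex in each block of weights, any feasible perturbation of `phi`, `psi`, or `pi1` away from these values can only increase it, which loosens the bound.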

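For $\omega$ and the remaining variational parameters, we use a coordinate 
ascent algorithm to maximize the bound. Writing $\rho$ and $\tau$ for the 
parameters of each GIG variational distribution as in [7], the updates are 

\[
\begin{aligned}
\omega_{lmn} &= \sum_k E_q[(A_C)_{mk} (S_C)_{lkn}] + \sum_j E_q[(A_I)_{lmj} (S_I)_{ljn}], \\
\rho^{(A_C)}_{mk} &= a + \sum_{l,n} \frac{(S_C)_{lkn}}{\omega_{lmn}}, &
\tau^{(A_C)}_{mk} &= \sum_{l,n} \frac{\pi_{lmn1}^2 X_{lmn} \phi_{lmnk}^2}{(S_C)_{lkn}}, \\
\rho^{(A_I)}_{lmj} &= b + \sum_{n} \frac{E_q[(S_I)_{ljn}]}{\omega_{lmn}}, &
\tau^{(A_I)}_{lmj} &= \sum_{n} \pi_{lmn2}^2 X_{lmn} \psi_{lmnj}^2 \, E_q\Big[ \frac{1}{(S_I)_{ljn}} \Big], \\
\rho^{(S_I)}_{ljn} &= c_{Y_{ln}} + \sum_{m} \frac{E_q[(A_I)_{lmj}]}{\omega_{lmn}}, &
\tau^{(S_I)}_{ljn} &= \sum_{m} \pi_{lmn2}^2 X_{lmn} \psi_{lmnj}^2 \, E_q\Big[ \frac{1}{(A_I)_{lmj}} \Big].
\end{aligned}
\tag{7}
\]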
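
The coordinate ascent updates rely on the expectations $E_q[y]$ and $E_q[1/y]$ under a GIG variational distribution. The sketch below is a hedged illustration of how these two moments can be evaluated. It assumes the parameterization $q(y) \propto y^{\gamma - 1} \exp(-\rho y - \tau / y)$ used in [7], and it replaces the closed-form expressions in terms of modified Bessel functions with plain numerical quadrature so that it needs no external library. The function name `gig_moments` and the default integration range are hypothetical choices that suit moderate parameter values only.

```python
import math

def gig_moments(gamma, rho, tau, t_max=30.0, steps=4000):
    """Return (E[y], E[1/y]) for q(y) proportional to
    y**(gamma - 1) * exp(-rho * y - tau / y), with rho > 0 and tau > 0.

    Uses the trapezoid rule after substituting y = exp(t); this is a
    numerical stand-in for the Bessel-function formulas, not a replacement.
    """
    h = 2.0 * t_max / steps
    z0 = z_pos = z_neg = 0.0
    for i in range(steps + 1):
        t = -t_max + i * h
        # With y = exp(t), y**(gamma - 1) dy becomes exp(gamma * t) dt.
        log_kernel = gamma * t - rho * math.exp(t) - tau * math.exp(-t)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        z0 += weight * math.exp(log_kernel)          # normalizer
        z_pos += weight * math.exp(log_kernel + t)   # integral of y * kernel
        z_neg += weight * math.exp(log_kernel - t)   # integral of kernel / y
    return z_pos / z0, z_neg / z0

mean_y, mean_inv_y = gig_moments(0.5, 1.0, 1.0)
```

For $\gamma = 1/2$ and $\rho = \tau = 1$ the distribution is a reciprocal inverse Gaussian, whose exact moments are $E[y] = 1.5$ and $E[1/y] = 1$, which gives a convenient sanity check for the quadrature.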



4 Experiments 

We demonstrate the proposed model on a real EEG dataset. We compare our 
model to GNMF, the only group analysis model in the literature. The 
performance measure is classification accuracy. Throughout the experiments we 
set all hyperparameters ($a$, $b$, $c_1$, $c_2$, $c_3$) to 0.1, the number of 
common bases to 3, and the number of individual bases to 1. 

4.1 IDIAP Dataset 

The IDIAP dataset [3] consists of precomputed features of EEG recorded 
from three subjects. Each subject was asked to perform one of three tasks 
for some duration of time. The tasks are imagination of left- or right-hand 
movements and generation of words beginning with the same random letter. 

The raw EEG is preprocessed by spatial filtering followed by power 
spectral density (PSD) estimation. The raw EEG has 8 centro-parietal channels, 
and the PSD is computed over 12 frequency bins every 62.5 ms, which yields a 
96-dimensional feature vector (8 channels × 12 bins). 

4.2 Common and Individual Factor Extraction 

In neuroimaging, discovering subject-independent patterns for a specific task 
is desirable, but intra-subject variability often thwarts the search for them. 
If we can separate the two kinds of patterns, the common activation patterns 
become more clearly visible. In Fig. 2 we show a side-by-side comparison of the 
results of GNMF (Fig. 2(a)) and our proposed model (Fig. 2(b)) in terms of the 
separation of the common and individual bases. The common bases found by 
our model are in fact common patterns shared by all three subjects, whereas 
the common bases found by GNMF are not quite the same across the subjects. 
This shows that our model is better able to separate the common patterns from 
the individual patterns. Additionally, according to the results of the BCI 
competition, subject 1 showed the best performance, indicating that he or she 
was able to concentrate on the task better than subjects 2 and 3. Hence, we 
expect the individual pattern of subject 1 to be clearer (more concentrated in 
a small region) than those of subjects 2 and 3. The rightmost column of 
Fig. 2(b) shows a pattern concentrated around a small region for subject 1 and 
less concentrated patterns for subjects 2 and 3. On the other hand, GNMF does 
not reflect this difference in individual performance in the individual bases 
in the rightmost column of Fig. 2(a). 





(a) Inference of bases of GNMF (b) Inference of bases of the proposed model 

Fig. 2. According to the results of BCI competition III, the best performance 
was achieved by subject 1, which means that he or she was less distracted. 
Likewise, subject 3 was more distracted than subject 2. This fact is well 
reflected in the proposed model (b). 



4.3 Sensitivity to the Training Data Size 

Fig. 3. Performance comparison under various training data sizes for each subject 

In general, the performance of a Bayesian graphical model is less sensitive 
to the size of the training data because it can take advantage of the prior. 
The proposed model inherits this advantage, so its performance is robust to the 
size of the training data. In practice, a smaller training dataset is desirable 
if it can achieve comparable performance, because gathering training data often 
costs time and money. Fig. 3 shows this robustness of our model. Note that our 
model performs well with only the common bases (except for subject 3). This 
shows that while our model captures the common patterns well, it does not 
capture the individual patterns well. This is a limitation of our model in its 
current form, and it suggests potential for better performance once the model 
can also capture the individual variability. 

5 Conclusion 

We presented a generative model for analyzing group EEG data. The proposed 
model finds common patterns for a specific task class across all subjects, as 
well as individual patterns that capture intra-subject variability. The 
proposed model seems to capture the common patterns better than the previously 
proposed group NMF model, and it seems less sensitive to the size of the 
training data because it is a generative model. However, a limitation of the 
model is that it does not model the individual variability well; addressing 
this is left for future research. We believe that better modeling of the 
individual variability, combined with the good performance in common pattern 
discovery, will result in an overall improved model. 